SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)
Abstract
This paper presents SimCTC, a simple contrastive learning (CL) framework that greatly advances state-of-the-art text clustering models. In SimCTC, a pre-trained BERT model first maps the input sequence to the representation space, which is then followed by three heads with different loss functions: a Clustering head, an Instance-CL head, and a Cluster-CL head. Experimental results on multiple benchmark datasets demonstrate that SimCTC remarkably outperforms 6 competitive methods, with a 1%-6% improvement in Accuracy (ACC) and a 1%-4% improvement in Normalized Mutual Information (NMI). Moreover, our results also show that performance can be further improved by setting an appropriate number of clusters in the cluster-level objective.
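The abstract names the three heads but does not give their loss formulas. As a rough illustration only, the sketch below assumes an NT-Xent contrastive loss (the form popularized by SimCLR) for the Instance-CL head, and reuses the same loss on the columns of a soft cluster-assignment matrix for the Cluster-CL head. The Clustering head is omitted. Random arrays stand in for BERT embeddings, and every name and value here (`nt_xent`, `softmax`, `temperature`, `n`, `d`, `k`, `w`) is hypothetical rather than taken from the paper.

```python
import numpy as np

def nt_xent(a, b, temperature=0.5):
    """NT-Xent loss: row i of `a` and row i of `b` form a positive pair."""
    z = np.concatenate([a, b], axis=0).astype(float)
    z /= np.linalg.norm(z, axis=1, keepdims=True) + 1e-12  # unit rows -> cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a row is never compared with itself
    n = a.shape[0]
    # index of the positive partner for each of the 2n rows
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # numerically stable log-softmax over each row
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), positives].mean()

def softmax(x):
    """Row-wise softmax."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d, k = 8, 16, 4  # batch size, embedding width, number of clusters (assumed values)

h1 = rng.normal(size=(n, d))              # stand-in for encoder output, view 1
h2 = h1 + 0.05 * rng.normal(size=(n, d))  # stand-in for an augmented view 2
w = rng.normal(size=(d, k))               # hypothetical linear cluster head

p1, p2 = softmax(h1 @ w), softmax(h2 @ w)  # soft cluster assignments, shape (n, k)

instance_loss = nt_xent(h1, h2)      # contrast samples (rows of the embeddings)
cluster_loss = nt_xent(p1.T, p2.T)   # contrast clusters (columns of the assignments)
unrelated_loss = nt_xent(h1, rng.normal(size=(n, d)))  # views with no shared content

print(instance_loss, cluster_loss, unrelated_loss)
```

In this sketch the number of clusters `k` only enters through the width of the assignment matrices, which is one way to read the abstract's remark that the cluster count in the cluster-level objective can be tuned separately.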
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i11.21635